ABSTRACT

This paper presents a 1.2 kbps speech coder based on the MELP
analysis algorithm. In the proposed coder, the MELP parameters
of three consecutive frames are grouped into a superframe and
jointly quantized to obtain a high coding efficiency. The
inter-frame redundancy is exploited with distinct quantization
schemes for different unvoiced/voiced (U/V) frame combinations
in the superframe. Novel techniques for improving
performance make use of the superframe structure. These
include pitch vector quantization using pitch differentials, joint
quantization of pitch and U/V decisions, and LSF quantization
with a forward-backward interpolation method. Subjective test
results indicate that the 1.2 kbps speech coder achieves
approximately the same quality as the proposed federal standard
2.4 kbps MELP coder.

1. INTRODUCTION 

Speech coding at 2.4 kbps and below is important for several
applications, such as secure voice and satellite communications.
For secure voice communications, the traditional 2.4 kbps Linear
Predictive Coding (LPC) vocoders were developed 20 to 30 years
ago and a particular version, LPC10, was adopted as a federal
standard. In March 1996, the US government's Digital Voice
Processing Consortium (DDVPC) selected the 2.4 kbps Mixed
Excitation Linear Prediction (MELP) [1] speech-coding
algorithm to be a new standard for narrow band secure voice
coding products and applications. The MELP algorithm leads to
a significant improvement in both speech quality and
intelligibility compared to the LPC10 algorithm. However, in
some difficult radio channels, robust 2.4 kbps transmission is not
always possible. To accommodate such adverse transmission
conditions, a lower bit rate is required. Previously, inter-frame
redundancy was exploited by applying an eight-frame superframe
structure to LPC10 [4]. Recently,
improvements to the modeling and quantization of the original
MELP algorithm were shown to increase quality while allowing
a reduced rate of 1.7 kbps [5].

In this paper, a new 1.2 kbps speech coder is proposed. The
proposed coder, called Multi-Frame MELP or MF-MELP, shares
the core analysis algorithm with the 2.4 kbps MELP standard,
and its transmitted parameters are the same as those of the 2.4
kbps MELP coder. In the proposed coder, the MELP parameters
of three consecutive frames are grouped together into a
superframe and jointly quantized to obtain high coding
efficiency. To take advantage of such a long frame structure,
novel quantization schemes are introduced for the superframe.
The proposed coding algorithm is described in this paper and the
results of subjective tests are reported. These results show that
the substantial redundancy that exists across three speech frames
is sufficient to achieve roughly the same quality as the 2.4 kbps
MELP at half the bit rate.

2. CODER OVERVIEW 

The MF-MELP coder is based on the MELP analysis algorithm. 
The MELP transmitted parameters are extracted every 22.5 ms 
frame (or 180 samples of speech at a sampling rate of 8 kHz). A 
superframe structure of length 67.5 ms comprising three
consecutive frames is adopted in the proposed coder. The MELP 
parameters for each frame in the superframe are jointly quantized 
to obtain high coding efficiency. A pitch smoother is 
incorporated to avoid large pitch errors, and this results in an 
increase in the look-ahead by 129 samples. The total algorithmic 
delay for MF-MELP is 103.75 ms.
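
The frame and superframe durations quoted above follow directly from
the 8 kHz sampling rate. As a quick check of the arithmetic (the
conversion of the 129-sample look-ahead increase to milliseconds is
added here for convenience and is not stated in the text):

```latex
\[
\frac{180\ \text{samples}}{8000\ \text{samples/s}} = 22.5\ \text{ms},
\qquad
3 \times 22.5\ \text{ms} = 67.5\ \text{ms},
\qquad
\frac{129\ \text{samples}}{8000\ \text{samples/s}} = 16.125\ \text{ms}.
\]
```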

The quantization schemes of the MF-MELP are designed so that 
the superframe structure is efficiently exploited by using vector 
quantization (VQ) and interpolation. The statistical properties of 
voiced (V) and unvoiced (U) speech are also taken into account. 
Each superframe is categorized into one of several coding states 
with a different bit allocation for each state. State selection is 
done according to the U/V pattern of the superframe. Moreover, 
since an incorrect state identification by the decoder causes a 
serious degradation in the synthesized speech, the MF-MELP 
utilizes several techniques for reducing the effect of state 
mismatch between encoder and decoder due to channel errors. 

3. QUANTIZATION OF PITCH AND U/V
DECISIONS

3.1 Pitch Quantization

The pitch information is transmitted only for voiced frames.
Different pitch quantization schemes are used for different U/V
combinations in the superframe. Within those superframes where
the voicing pattern contains either two or three voiced frames, the
pitch parameters are vector-quantized. For voicing patterns
containing only one voiced frame, the scalar quantizer used in the
MELP standard is applied for the pitch of the voiced frame. For
the UUU voicing pattern, no pitch information is transmitted.

The pitch values, $P_i$ ($i = 1, 2, 3$), obtained from the pitch analysis
are transformed into logarithmic values, $p_i = \log P_i$, prior to
quantization. For each superframe, a pitch vector is constructed
with components equal to the log pitch value for each voiced
frame and a zero value for each unvoiced frame. For voicing
patterns with two or three voiced frames, the pitch vector is
quantized using a VQ algorithm with a new distortion measure.
This distortion measure incorporates pitch differentials into the
codebook search, which makes it possible to consider the time
evolution of the pitch. This feature is motivated by the perceptual
importance of adequately tracking the pitch trajectory.

The pitch VQ algorithm has three steps for obtaining the best 
index: 

Step 1: Select the M-best candidates using the weighted squared
Euclidean distance measure:

$$ d = \sum_{i=1}^{3} w_i \, | p_i - \hat{p}_i |^2 $$

where the weighting coefficient is defined by

$$ w_i = \begin{cases} 1 & \text{for a voiced frame} \\ 0 & \text{for an unvoiced frame} \end{cases} $$

and $p_i$ and $\hat{p}_i$ are the unquantized and quantized log pitch
values, respectively. The above equation indicates that only
voiced frames are taken into account in the codebook search.

Step 2: Calculate the differentials of the unquantized log pitch
values using

$$ \Delta p_i = \begin{cases} p_i - p_{i-1} & \text{if the } i\text{-th and } (i-1)\text{-th frames are voiced} \\ 0 & \text{otherwise} \end{cases} $$

for $i = 1, 2, 3$, where $p_0$ is the last log pitch value of the previous
superframe. For the pitch candidates selected in Step 1, calculate
the quantized differentials by replacing $\Delta p_i$ and $p_i$ by
$\Delta \hat{p}_i$ and $\hat{p}_i$ respectively in the equation above, where
$\hat{p}_0$ is the quantized version of $p_0$.

Step 3: Select the optimum index from the M-best candidates
that minimizes

$$ d' = \sum_{i=1}^{3} w_i \, | p_i - \hat{p}_i |^2 + \delta \sum_{i=1}^{3} | \Delta p_i - \Delta \hat{p}_i |^2 $$

where $\delta$ is a parameter to control the contribution of pitch
differentials, which is set to 1 in the proposed coder.

For superframes that contain only one voiced frame, scalar
quantization of the pitch is performed. The pitch value is
quantized on a logarithmic scale with a 99-level uniform
quantizer ranging from 20 to 160 samples. The quantizer is the
same as that in the 2.4 kbps MELP standard, where the 99 levels
are mapped to a 7-bit pitch codeword and the 28 unused
codewords with Hamming weight 1 or 2 are reserved for error
protection.

Table 1. Joint quantization of pitch and U/V decisions.

U/V patterns         3-bit CB   9-bit pitch codebook
-------------------  ---------  --------------------------------------------
UUU, UUV, UVU, VUU   000        The pitch value is quantized with the same
                                99-level uniform quantizer as the 2.4 kbps
                                standard. The pitch value and U/V pattern
                                are then mapped to this 9-bit codebook.
VVU                  001        These U/V patterns share the same codebook
VUV                  010        containing 512 codevectors of the pitch
UVV                  100        triple.
VVV                  011        512-level codebook A
                     101        512-level codebook B
                     110        512-level codebook C
                     111        512-level codebook D
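
The three-step pitch VQ search of this section can be sketched as
follows. This is a minimal illustration and not the coder's actual
implementation. It assumes that the Step 1 cost is the squared
log-pitch error with weight 1 on voiced frames and 0 on unvoiced
frames, and that the Step 3 cost adds `delta` times the squared error
of the pitch differentials. The function name, the argument layout,
the `prev_voiced` flag and the default `m_best=8` are hypothetical.

```python
def pitch_vq_search(log_pitch, voiced, prev_log_pitch, prev_q_log_pitch,
                    prev_voiced, codebook, m_best=8, delta=1.0):
    """Return the index of the codevector chosen by the three-step search.

    log_pitch / voiced describe the three frames of the current superframe;
    the prev_* arguments describe the last frame of the previous superframe.
    """
    # Weighting coefficients: 1 for voiced frames, 0 for unvoiced frames.
    w = [1.0 if v else 0.0 for v in voiced]

    def weighted_dist(cand):
        return sum(wi * (p - q) ** 2 for wi, p, q in zip(w, log_pitch, cand))

    def differentials(values, previous):
        # Differential is p_i - p_{i-1} when frames i and i-1 are both
        # voiced, and 0 otherwise.
        seq = [previous] + list(values)
        flags = [prev_voiced] + list(voiced)
        return [seq[i] - seq[i - 1] if flags[i] and flags[i - 1] else 0.0
                for i in range(1, 4)]

    # Step 1: keep the M best candidates under the weighted squared distance.
    ranked = sorted(range(len(codebook)),
                    key=lambda k: weighted_dist(codebook[k]))
    candidates = ranked[:m_best]

    # Step 2: differentials of the unquantized log pitch values.
    dp = differentials(log_pitch, prev_log_pitch)

    # Step 3: among the candidates, minimize distance plus differential error.
    def total_cost(k):
        dq = differentials(codebook[k], prev_q_log_pitch)
        diff_term = sum((a - b) ** 2 for a, b in zip(dp, dq))
        return weighted_dist(codebook[k]) + delta * diff_term

    return min(candidates, key=total_cost)
```

With a flat pitch track, a candidate that is slightly further away but
equally flat can win over a closer candidate that oscillates, which is
the trajectory-tracking behavior the differential term is meant to
encourage.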

3.2 Joint Quantization of Pitch and U/V Decisions

The U/V decisions and pitch parameters for each superframe are
jointly quantized using 12 bits. The joint quantization scheme is
summarized in Table 1. In this scheme, the allocation of 12 bits
consists of 3 mode bits (representing the 8 possible combinations
of U/V decisions for the 3 frames in a superframe) and the
remaining 9 bits for pitch values. The scheme employs six
separate pitch codebooks, five having 9 bits (i.e. 512 entries
each) and one being the scalar quantizer; the specific codebook is
determined according to the bit patterns of the 3-bit codeword
representing the quantized voicing pattern. Therefore the U/V
voicing pattern is first encoded into a 3-bit codeword, which is
then used to select one of the 6 codebooks shown. The set of 3
pitch values is vector-quantized with the selected codebook to
generate a 9-bit codeword that identifies the quantized set of 3
pitch values. Note that four codebooks are assigned to the
superframes in the VVV mode, which means that the pitch
vectors in the VVV-type superframes are quantized by one of
2048 codewords. If the number of voiced frames in the
superframe is not larger than one, the 3-bit codeword is set to 000
and the distinction between different modes is determined within
the 9-bit codebook. Note that the latter case consists of the 4
modes UUU, VUU, UVU, and UUV. In this case, the 9 available
bits are more than sufficient to represent the mode information as
well as the pitch value since there are 3 modes with 128 pitch
values and one mode with no pitch value.
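
The packing of the 3 mode bits and 9 pitch bits can be sketched as
below. The mode codewords are taken from the joint-quantization table.
Everything else is an assumption made for illustration: the function
name, the bit order of the 12-bit word, the way the four VVV codebooks
are addressed by a single 0..2047 index, and the layout of the three
single-voiced patterns inside the 000 codebook are not specified in
the text.

```python
# Mode codewords from the joint-quantization table.
TWO_VOICED_MODE = {"VVU": 0b001, "VUV": 0b010, "UVV": 0b100}
VVV_MODES = [0b011, 0b101, 0b110, 0b111]      # codebooks A, B, C, D
# Ordering of the single-voiced patterns is a hypothetical choice.
SINGLE_VOICED = ["VUU", "UVU", "UUV"]


def pack_pitch_uv(pattern, index=0):
    """Pack a U/V pattern and a pitch index into a 12-bit integer.

    The top 3 bits hold the mode codeword, the low 9 bits the pitch codeword.
    """
    if pattern == "UUU":
        mode, nine = 0b000, 0                 # no pitch is transmitted
    elif pattern in SINGLE_VOICED:
        if not 0 <= index < 128:
            raise ValueError("scalar pitch index must be in 0..127")
        # Hypothetical layout: entry 0 is UUU, followed by 3 blocks of 128.
        mode = 0b000
        nine = 1 + 128 * SINGLE_VOICED.index(pattern) + index
    elif pattern in TWO_VOICED_MODE:
        if not 0 <= index < 512:
            raise ValueError("pitch VQ index must be in 0..511")
        mode, nine = TWO_VOICED_MODE[pattern], index
    elif pattern == "VVV":
        if not 0 <= index < 2048:
            raise ValueError("VVV pitch VQ index must be in 0..2047")
        # Assumed addressing: the 2048 codewords are split into A, B, C, D.
        mode, nine = VVV_MODES[index // 512], index % 512
    else:
        raise ValueError("unknown U/V pattern: " + pattern)
    return (mode << 9) | nine
```

Under this layout the 000 codebook uses 1 + 3 x 128 = 385 of its 512
entries, which matches the remark that 9 bits are more than sufficient
for the four low-voicing modes.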

4. LSF QUANTIZATION 
4.1 Quantization Procedure 

Table 2 shows the bit allocation for quantizing the line spectral
frequencies (LSFs). In the table, the original LSF vectors for the
three frames are denoted by $l_1$, $l_2$ and $l_3$. For the UUU, UUV,
UVU and VUU modes, the LSF vectors of unvoiced frames are
quantized using a 9-bit codebook, while the LSF vector of the
voiced frame is quantized with the same 25-bit multi-stage VQ
(MSVQ) quantizer as in the MELP standard.

Table 2. Bit allocation for LSF quantization.

U/V pattern     LSF l_1   LSF l_2   LSF l_3   Inter. Coef.   Res.      Total
--------------  --------  --------  --------  -------------  --------  -----
UUU             9         9         9         0              0         27
VUU             7,6,6,6   9         9         0              0         43
UVU             9         7,6,6,6   9         0              0         43
UUV             9         9         7,6,6,6   0              0         43
UVV, VUV, VVV   0         0         7,6,6,6   4              8,6       43
VVU             0         0         9         4              8,6,6,6   39

Entries with several numbers list the bits per MSVQ stage; for
example, 7,6,6,6 denotes the 25-bit MSVQ.

Table 3. Bit allocation of 1.2 kbps MF-MELP coder for a superframe
of 67.5 ms.

                               U/V patterns of superframe
Parameters                     VVV   UVV, VUV   VVU   UUV, UVU, VUU   UUU
-----------------------------  ----  ---------  ----  --------------  ----
Pitch & global U/V decisions   12    12         12    12              12
LSFs                           43    43         39    43              27
Gains                          10    10         10    10              10
Bandpass voicing               6     4          4     2               0
Fourier magnitudes             8     8          8     8               0
Aperiodic flag                 1     1          1     1               0
Synchronization                1     1          1     1               1
Error protection               0     2          6     4               31
Total                          81    81         81    81              81

The LSF vectors for the other U/V patterns are encoded using a
forward-backward interpolation scheme. This scheme works as
follows. First, the LSFs of the last frame in the current
superframe, $l_3$, are quantized to $\hat{l}_3$ using the 9-bit codebook for
the unvoiced case or the same 25-bit MSVQ codebook as in the
MELP coder for the voiced case. Predicted values of $l_1$ and $l_2$ are
then obtained by interpolating $\hat{l}_0$ and $\hat{l}_3$ as follows ($\hat{l}_0$ is the
quantized LSF vector of the last frame of the previous superframe):

$$ \tilde{l}_1(j) = a_1(j)\,\hat{l}_0(j) + [1 - a_1(j)]\,\hat{l}_3(j) $$
$$ \tilde{l}_2(j) = a_2(j)\,\hat{l}_0(j) + [1 - a_2(j)]\,\hat{l}_3(j), \qquad j = 1, \ldots, 10 $$

where $a_1(j)$ and $a_2(j)$ are the interpolation coefficients, and
$l_i(j)$ is the $j$-th component of $l_i$. The coefficients are stored in a
codebook and the best set of coefficients is selected by
minimizing the distortion measure:

$$ E = \sum_{j=1}^{10} w_1(j)\,| l_1(j) - \tilde{l}_1(j) |^2 + \sum_{j=1}^{10} w_2(j)\,| l_2(j) - \tilde{l}_2(j) |^2 $$

where $w_i(j)$ are the weighting coefficients obtained with the
same procedure as in the 2.4 kbps MELP standard. After
obtaining the best interpolation coefficients, the residual LSF
vectors for frames 1 and 2 are computed by

$$ r_1(j) = l_1(j) - \tilde{l}_1(j) $$
$$ r_2(j) = l_2(j) - \tilde{l}_2(j), \qquad j = 1, \ldots, 10. $$

The two residual vectors are concatenated and the resulting
20-dimension residual vector is encoded with an MSVQ quantizer.
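
The interpolation search can be sketched as follows, assuming that
each predicted component is a(j) times the previous superframe's
quantized last-frame LSF plus (1 - a(j)) times the current
superframe's quantized last-frame LSF, and that the distortion is the
weighted squared error summed over frames 1 and 2. The function names
and the representation of a codebook entry as a pair of coefficient
lists are hypothetical, and the weights are simply passed in rather
than computed as in the MELP standard.

```python
def interpolate(a, prev_q, last_q):
    """Predict one frame: a(j) * prev_q(j) + (1 - a(j)) * last_q(j)."""
    return [aj * p + (1.0 - aj) * q for aj, p, q in zip(a, prev_q, last_q)]


def weighted_error(w, target, predicted):
    """Weighted squared error between an LSF vector and its prediction."""
    return sum(wj * (t - p) ** 2 for wj, t, p in zip(w, target, predicted))


def select_interpolation(l1, l2, prev_q, last_q, w1, w2, codebook):
    """Return (best index, residual of frame 1, residual of frame 2).

    Each codebook entry is a pair (a1, a2) of per-component coefficients.
    """
    best = None
    for idx, (a1, a2) in enumerate(codebook):
        p1 = interpolate(a1, prev_q, last_q)
        p2 = interpolate(a2, prev_q, last_q)
        err = weighted_error(w1, l1, p1) + weighted_error(w2, l2, p2)
        if best is None or err < best[0]:
            best = (err, idx, p1, p2)
    _, idx, p1, p2 = best
    # Residuals that would be concatenated and passed on to the MSVQ stage.
    r1 = [x - y for x, y in zip(l1, p1)]
    r2 = [x - y for x, y in zip(l2, p2)]
    return idx, r1, r2
```

The sketch works for any vector length; the coder described here uses
10 LSF components per frame, so the concatenated residual has 20
components.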

4.2 Design Method for Interpolation Codebook

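The interpolation codebook is designed using the generalized
Lloyd algorithm. In the training algorithm, two procedures are
alternately performed. The first procedure encodes the LSF vectors
of the training database using the distortion measure E. The second
procedure optimizes the interpolation codebook in such a way
that the summation of all the superframe distortions is
minimized. By setting the partial derivatives of the summation
of the distortions with respect to $a_1(j)$ and $a_2(j)$ to zero, the
optimum interpolation coefficients are obtained from

$$ a_i(j) = \frac{\sum_n w_{i,n}(j)\,[\hat{l}_{0,n}(j) - \hat{l}_{3,n}(j)]\,[l_{i,n}(j) - \hat{l}_{3,n}(j)]}{\sum_n w_{i,n}(j)\,[\hat{l}_{0,n}(j) - \hat{l}_{3,n}(j)]^2}, \qquad i = 1, 2 $$

for $j = 1, \ldots, 10$, where $n$ runs over the training superframes
assigned to the codebook entry, $l_{i,n}$ is the original LSF vector of
frame $i$, and $\hat{l}_{0,n}$ and $\hat{l}_{3,n}$ are the quantized LSF vectors of the
last frame of the previous and current superframes, respectively.
The interpolation coefficient codebook was trained and tested for
several codebook sizes. A codebook with
16 entries was found to be quite efficient.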
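
The codebook update step of the training loop can be sketched for a
single LSF component as below. It is the closed-form weighted
least-squares solution obtained by setting the derivative of the
summed distortion to zero, under the assumption that the prediction is
a * prev_q + (1 - a) * last_q. The function name, the tuple layout of
the training samples and the fallback value for a degenerate cell are
hypothetical.

```python
def optimal_coefficient(samples):
    """Least-squares interpolation coefficient for one LSF component.

    `samples` holds one tuple (w, target, prev_q, last_q) per training
    superframe assigned to the codebook entry being updated.
    """
    # Minimizing sum of w * (target - last_q - a * (prev_q - last_q))**2
    # over a gives a ratio of two weighted sums.
    num = sum(w * (p - q) * (t - q) for w, t, p, q in samples)
    den = sum(w * (p - q) ** 2 for w, t, p, q in samples)
    # Fallback for an empty or degenerate cell is an assumption (midpoint).
    return num / den if den > 0.0 else 0.5
```

Running this for each of the 10 components and for both frames yields
one updated codebook entry; alternating it with the encoding step
gives the usual generalized Lloyd iteration.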

5. BIT ALLOCATION

The bit allocation of the 1.2 kbps coder is summarized in Table 3.
In the coder, two gain parameters are calculated per frame, giving 6
gains per superframe. The 6 gain parameters are vector-quantized
in the logarithmic domain using a 10-bit codebook.
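
As a consistency check on Table 3, the per-state bit counts can be
summed and converted to a bit rate. The numbers below are transcribed
from the table; the aperiodic-flag entry for the VVV state is not
legible in the table and is taken to be 1 here, which is the value
required for the column to reach the stated total. The variable names
are illustrative only.

```python
# Columns: pitch + U/V, LSFs, gains, bandpass voicing, Fourier magnitudes,
# aperiodic flag, synchronization, error protection.
BITS = {
    "VVV":         (12, 43, 10, 6, 8, 1, 1, 0),   # aperiodic flag assumed 1
    "UVV/VUV":     (12, 43, 10, 4, 8, 1, 1, 2),
    "VVU":         (12, 39, 10, 4, 8, 1, 1, 6),
    "UUV/UVU/VUU": (12, 43, 10, 2, 8, 1, 1, 4),
    "UUU":         (12, 27, 10, 0, 0, 0, 1, 31),
}

SUPERFRAME_MS = 3 * 22.5  # three 22.5 ms frames per superframe

totals = {state: sum(bits) for state, bits in BITS.items()}
bit_rate_bps = {state: total * 1000 / SUPERFRAME_MS
                for state, total in totals.items()}
```

Every state sums to 81 bits per 67.5 ms superframe, that is 1200 bits
per second, with the bits freed in less-voiced states going to error
protection.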

The binary voicing decisions for 5 bands are obtained per frame.
The bandpass information for the lowest band is determined from
the U/V decision. The bandpass decisions of the remaining 4
bands are employed only for voiced frames and quantized with a
2-bit codebook.

The Fourier magnitude vector is computed only for voiced
frames. The vector of the last voiced frame in the current
superframe is quantized with the same 8-bit quantizer as the
MELP standard. The Fourier magnitude vectors for the other
voiced frames are reconstructed using the quantized vectors of
the current and previous superframes. The reconstruction
procedure uses either an interpolation or a repetition method
according to the U/V decisions.

The aperiodic flag is also obtained only from voiced frames. The
MF-MELP coder uses 1 bit per superframe for the quantization
of the aperiodic flag. Different 1-bit codebooks are employed for
different U/V patterns.

6. TEST RESULTS

Two subjective tests were conducted to evaluate the performance
of the MF-MELP speech coder. One is the Diagnostic Rhyme
Test (DRT) for measuring speech intelligibility, and the other is
the Diagnostic Acceptability Measure (DAM) for speech quality.
For comparison purposes, the 2.4 kbps MELP coder was
included in both tests. The 1.2 and 2.4 kbps coders were tested in
quiet background, HMMWV noise environment and 0.5%
random bit error channel. Note that the HMMWV is a heavy-duty
four-wheel-drive vehicle used for troop transport. The
noise pre-processor proposed in Ref. [3] is integrated into the 1.2
and 2.4 kbps coders.

The DRT and DAM test results are shown in Tables 4 and 5,
respectively. It is shown that, in quiet and noise environments,
the MF-MELP coder provides comparable intelligibility to the
2.4 kbps MELP coder. Although the proposed coder obtains
lower DAM scores than the MELP standard, the quality of the
1.2 kbps coder is reasonably close to that of the 2.4 kbps MELP
coder.

Table 4. DRT testing results.

                     DRT Scores
Description          Quiet    HMMWV    0.5% BER
-------------------  -------  -------  --------
2.4 kbps MELP        93.8     72.5     93.4
1.2 kbps MF-MELP     92.1     70.2     Z63

Table 5. DAM testing results.

                     DAM Scores
Description          Quiet    HMMWV
-------------------  -------  -------
2.4 kbps MELP        68.2     52.2
1.2 kbps MF-MELP     61.8     48.9

For the channel error condition, the difference in intelligibility
score between the 1.2 and 2.4 kbps coders is larger than that in
other conditions. An efficient way to protect
against transmission errors is to use a parity bit for the 3-bit
mode codebook in Table 1. Another version of the MF-MELP
coder with the parity check was designed, in which, to save 1 bit
for the parity check, the voiced LSF parameters are quantized
with a 24-bit MSVQ codebook instead of the 25-bit MSVQ
codebook. The MF-MELP coder with the parity check is found to
improve the score by 3.7 in the channel error condition, while the
score for clean channel (quiet background) degrades by 1.8.

7. CONCLUSIONS

This paper has introduced a 1.2 kbps speech coder, MF-MELP,
based on the MELP analysis algorithm. In the proposed coder,
the MELP transmitted parameters of three consecutive frames are
quantized together. Efficient vector quantization schemes are
employed depending on the different U/V decisions for the
superframe, taking into account statistical properties of voiced
and unvoiced speech. The MF-MELP coder incorporates novel
techniques for improving the performance, such as pitch
quantization with pitch differentials, joint quantization of pitch
and U/V decisions, and LSF quantization with a forward-backward
interpolation method. The DAM and DRT test results
have indicated that the MF-MELP coder has approximately the
same speech quality as the 2.4 kbps MELP standard.